
Add new collector exposing 'ksmd' stats #165

Merged
merged 1 commit into from
Jan 21, 2016


pborzenkov
Contributor

Add new collector which exposes the content of /sys/kernel/mm/ksm
directory. This directory contains control and statistics files for
Kernel Samepage Merging daemon.

This is useful for monitoring hosts that run the KVM hypervisor, as this information provides a deeper understanding of what happens to the memory subsystem on such hosts.

The collector is not enabled by default.

@matthiasr
Contributor

Could you also add fixtures, a few tests, and wire it into the end-to-end test? For the latter, add the collector to the list of collectors in the script, and run the script with -u to update the result fixture.

prometheus.CounterOpts{
Namespace: Namespace,
Subsystem: ksmdSubsystem,
Name: "full_scans",
Contributor

By convention, counters should be suffixed by _total

Contributor Author

@brian-brazil While this is definitely a recommended practice per the Prometheus docs, I thought node_exporter was a somewhat "special" exporter that followed the rule "do not change the underlying OS metric name", and a quick search through the sources confirmed this.

If that is no longer true, and new collectors should use the '_total' suffix for counters regardless of the underlying metric name, then sure, I will fix that.

Contributor

#150 covers fixing this generally; you should add _total where it's practical to do so and it's clear that the metric is a counter.
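A small sketch of the naming convention under discussion. It assumes Namespace is "node" and ksmdSubsystem is "ksmd" (neither value is shown in this thread), and metricName is a hypothetical helper that joins the non-empty parts with underscores and appends _total for counters; it is not part of client_golang.

```go
package main

import (
	"fmt"
	"strings"
)

// metricName is a hypothetical helper: it joins the non-empty parts with
// underscores and, for counters, appends "_total" unless already present.
func metricName(namespace, subsystem, name string, isCounter bool) string {
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	fq := strings.Join(parts, "_")
	if isCounter && !strings.HasSuffix(fq, "_total") {
		fq += "_total"
	}
	return fq
}

func main() {
	// "node" and "ksmd" are assumed values for Namespace and ksmdSubsystem.
	fmt.Println(metricName("node", "ksmd", "full_scans", true))    // node_ksmd_full_scans_total
	fmt.Println(metricName("node", "ksmd", "pages_shared", false)) // node_ksmd_pages_shared
}
```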

@pborzenkov
Contributor Author

@matthiasr Yeah, sure. I'll add some tests.

@pborzenkov
Contributor Author

@matthiasr I've added the ksmd collector to the end-to-end test. Regarding unit tests: I'm not sure they are needed. There is no complex structure in those sysfs files; they are just a bunch of files, each exporting a single integer value.

@SuperQ
Member

SuperQ commented Jan 8, 2016

Ping, it looks like this needs a rebase.

@pborzenkov
Contributor Author

Rebased onto master.

@RichiH mentioned this pull request Jan 13, 2016

var (
ksmdFiles = []string{"full_scans", "merge_across_nodes", "pages_shared", "pages_sharing",
"pages_to_scan", "pages_unshared", "pages_volatile", "run", "sleep_millisecs"}
Member

Given that this collector already explicitly defines the list of files/metrics to export, I wonder whether we should also go for seconds instead of millisecs here? Wdyt, @brian-brazil?

Contributor

Yes, this should be seconds. It looks like we can also convert pages to bytes as this feature doesn't support huge pages.
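A sketch of the millisecond-to-seconds conversion suggested here. toBaseUnit is a hypothetical helper; it only handles the _millisecs suffix and passes every other file through unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// toBaseUnit is a hypothetical helper: files ending in "_millisecs" are
// renamed to "_seconds" and their value divided by 1000; others pass through.
func toBaseUnit(file string, raw int64) (string, float64) {
	if strings.HasSuffix(file, "_millisecs") {
		return strings.TrimSuffix(file, "_millisecs") + "_seconds", float64(raw) / 1000
	}
	return file, float64(raw)
}

func main() {
	name, v := toBaseUnit("sleep_millisecs", 20)
	fmt.Println(name, v) // sleep_seconds 0.02
}
```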

Contributor Author

Sure, I will rename sleep_millisecs to sleep_seconds. Regarding the conversion of pages to bytes, I'm not sure that this is a good idea. First of all, ksmd operates exclusively on pages. And while having metrics like pages_shared, pages_sharing and so on in bytes looks good (one might want to see the real memory savings in bytes), converting pages_to_scan (which is a setting, not a metric) to bytes just looks weird and confusing.

If someone wants bytes instead of pages, this can easily be done with a Prometheus query.

Contributor

We convert pages to bytes where possible everywhere else, so that the users don't have to figure out how to convert from a multitude of different units to get bytes.

It's probably best to exclude pages_to_scan; that doesn't seem like something useful to export as a metric. We'd normally only expose settings if they were a limit (so that you can calculate how full something is) or something that's commonly changed.

Contributor Author

> It's probably best to exclude pages_to_scan, that doesn't seem like something useful to export as a metric.

Actually, it is useful. It's constantly adjusted by the ksmtuned daemon (on RHEL-based systems) depending on current memory pressure, so it's useful to know this setting's value to correlate it with CPU usage and the rate at which pages are being shared.
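A sketch of how the exported files might be typed under this argument. It assumes full_scans is the only value that only ever increases, while tunables such as pages_to_scan are exposed as gauges because a daemon like ksmtuned can move them up or down; metricKind and ksmdKind are hypothetical names.

```go
package main

import "fmt"

// metricKind is a hypothetical classification used for illustration.
type metricKind int

const (
	gauge metricKind = iota
	counter
)

// ksmdKind reports how a ksmd file could be exposed: full_scans only grows,
// so it is treated as a counter; everything else, including tunables such
// as pages_to_scan, can go up or down and is treated as a gauge.
func ksmdKind(file string) metricKind {
	if file == "full_scans" {
		return counter
	}
	return gauge
}

func main() {
	for _, f := range []string{"full_scans", "pages_to_scan", "pages_sharing"} {
		kind := "gauge"
		if ksmdKind(f) == counter {
			kind = "counter"
		}
		fmt.Println(f, kind)
	}
}
```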

@grobie
Member

grobie commented Jan 21, 2016

👍 Thanks and sorry for the long delay. I'll wait for @brian-brazil's reply to my question and include this in the next release (expect it this week).

@matthiasr
Contributor

👍 for tests as they are. E2E will be enough to catch regressions.

Add new collector which exposes the content of /sys/kernel/mm/ksm
directory. This directory contains control and statistics files for
Kernel Samepage Merging daemon.

The collector is not enabled by default.

Signed-off-by: Pavel Borzenkov <pavel.borzenkov@gmail.com>
grobie added a commit that referenced this pull request Jan 21, 2016
Add new collector exposing 'ksmd' stats
@grobie merged commit d1f0f22 into prometheus:master Jan 21, 2016
@pborzenkov deleted the ksmd-collector branch January 21, 2016 14:27

6 participants